Area and time efficient implementations of matrix multiplication on FPGAs

Authors

  • Ju-wook Jang
  • Seonil B. Choi
  • Viktor K. Prasanna
Abstract

We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce both latency and area. Our designs improve on the previous designs in [7] and [1] in terms of the area/speed metric, where speed denotes the maximum achievable running frequency. The area/speed metrics for the designs in [7], [1], and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of the design in [7] is 0.57 μs, while our design takes 0.15 μs using 18% less area. The area of our designs is 11% to 46% smaller than that of the best known systolic designs based on [9] with the same latency, for matrices of sizes 3 × 3 to 12 × 12. The performance improvements tend to grow with the problem size.
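As a rough illustration of the figures quoted in the abstract, the short sketch below computes how many times smaller the proposed design's area/speed metric and latency are relative to the earlier designs. It assumes, as the abstract's wording implies, that a lower area/speed value is better; the dictionary keys and the helper name `improvement_factor` are hypothetical and not taken from the paper.

```python
# Figures quoted in the abstract for 4 x 4 matrix multiplication.
# Assumption: a lower area/speed value is better (the proposed design has the lowest).
AREA_SPEED = {"design in [7]": 14.45, "design in [1]": 4.93, "proposed": 2.35}
LATENCY_US = {"design in [7]": 0.57, "proposed": 0.15}


def improvement_factor(baseline: float, proposed: float) -> float:
    """Return how many times smaller the proposed value is than the baseline."""
    return baseline / proposed


# Area/speed improvement of the proposed design over each earlier design.
area_speed_factors = {
    name: round(improvement_factor(value, AREA_SPEED["proposed"]), 2)
    for name, value in AREA_SPEED.items()
    if name != "proposed"
}

# Latency improvement over the design in [7].
latency_factor = round(
    improvement_factor(LATENCY_US["design in [7]"], LATENCY_US["proposed"]), 2
)

print(area_speed_factors)
print(latency_factor)
```

Under these quoted numbers, the area/speed metric is roughly 6 times lower than that of the design in [7] and roughly 2 times lower than that of the design in [1], and the latency is 3.8 times lower than that of the design in [7].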


Similar resources

Energy Performance of Floating-Point Matrix Multiplication on FPGAs

Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers the optimization of latency or area only. In this paper, we analyze the impact of various parame...


A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...


Area-Time-Power of Modular Multipliers implemented in FPGA

Three modular multiplication algorithms are described and compared: the so-called Multiply and Reduce, the Shift and Add, and, finally, the Montgomery product. The cost of their combinational implementation on the Xilinx FPGA family is estimated. Practical results in terms of area, delay, and power are presented for both combinational and completely sequential implementations.


Area/performance trade-off analysis of an FPGA digit-serial GF(2^m) Montgomery multiplier based on LFSR

Montgomery multiplication is a common and important algorithm for improving the efficiency of public-key cryptographic algorithms such as RSA and Elliptic Curve Cryptography (ECC). A natural choice for implementing this time-consuming multiplication defined on finite fields, mainly over GF(2^m), is the use of Field Programmable Gate Arrays (FPGAs) for being reconfigurable, flexible and physically s...


A Model-Based Methodology for Application Specific Energy Efficient Data Path Design Using FPGAs

We present a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain specific modeling, coarse-grained performance evaluation, design space exploration, and low level simulation to understand the tradeoffs between energy, latency, and area. The domain specific modeling technique defines a high-level model by identifying various components and parameters ...




Publication date: 2002